Mining Gene Expression Data Using PCA Based Clustering

Author

  • N. P. Gopalan
Abstract

As the amount of laboratory data in molecular biology and bioinformatics grows exponentially each year, driven by advanced technologies such as DNA microarrays, new efficient and effective clustering methods must be developed to process this fast-growing body of biological data. Numerous clustering techniques have been applied to the analysis of gene expression data to extract biologically significant patterns, but issues such as clustering quality, the high dimensionality of the input data, and computational efficiency still need to be addressed. A novel hybrid clustering algorithm is proposed that blends Principal Component Analysis (PCA) with an enhanced correlation-based clustering. PCA is a classical statistical technique for finding patterns in high-dimensional data. The empirical results show that this approach provides more stable clustering performance in terms of quality and efficiency. The resulting clusters offer potential insight into gene function, molecular biological processes and regulatory mechanisms.
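
The abstract does not spell out the enhanced correlation-based step, so the following is only a minimal sketch of the general idea: reduce the expression matrix with PCA, then group genes whose reduced profiles are highly correlated. The greedy "leader" clustering used here is a simple stand-in, not the author's algorithm, and the names `pca_scores` and `correlation_clusters`, the 0.8 threshold, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project the rows of X (genes x samples) onto the top principal components."""
    Xc = X - X.mean(axis=0)                      # center each sample column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # shape: genes x n_components

def correlation_clusters(Z, threshold=0.8):
    """Greedy leader clustering (assumed stand-in): a gene joins the first
    cluster whose leader has Pearson correlation >= threshold with it."""
    R = np.corrcoef(Z)                           # gene-by-gene correlation matrix
    leaders = []
    labels = np.empty(len(Z), dtype=int)
    for i in range(len(Z)):
        for c, lead in enumerate(leaders):
            if R[i, lead] >= threshold:
                labels[i] = c
                break
        else:                                    # no leader was similar enough
            leaders.append(i)
            labels[i] = len(leaders) - 1
    return labels

# Synthetic data (assumption): 5 genes rising and 5 genes falling over 6 samples.
rng = np.random.default_rng(0)
up = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
down = up[::-1]
X = np.vstack(
    [up + 0.05 * rng.standard_normal(6) for _ in range(5)]
    + [down + 0.05 * rng.standard_normal(6) for _ in range(5)]
)

scores = pca_scores(X, n_components=3)
labels = correlation_clusters(scores, threshold=0.8)
print(labels)
```

On this toy input the rising and falling genes should land in separate clusters. On real microarray data, the number of components and the correlation threshold would need tuning, and a more robust clustering step would likely replace the greedy pass.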

Similar resources

Privacy Preserving Based on PCA Transformation Using Data Perturbation Technique

Research on maintaining confidentiality, privacy and security in data mining (privacy-preserving data mining, PPDM) is one of the biggest current trends. Recent advances in data collection, data dissemination and related technologies have inaugurated a new era of research in which existing data mining algorithms should be reconsidered from a different point of view, that of privacy preservation. We propose a simple PCA based transformation ap...

Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis

Recognizing genes with distinctive expression levels can help in the prevention, diagnosis and treatment of diseases at the genomic level. In this paper, fast global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement on the k-means clustering method. It is an incremental clustering method that starts with one cluster. Iteratively ...

Clustering Gene Expression Data Using Random Forest Dissimilarity

Background: The clustering of gene expression data plays an important role in the diagnosis and treatment of cancer. Data of this kind typically involve a large number of variables (genes) compared with the number of samples (patients). Many clustering methods have been built on the dissimilarity among observations, calculated by a distance function. As increa...

NEC for Gene Expression Analysis

The aim of this work is to apply a novel, comprehensive machine learning tool for data mining to the preprocessing and interpretation of gene expression data. Some visualization facilities are also provided. The data mining framework consists of two main parts: a preprocessing phase and a clustering-agglomerating phase. The first phase comprises a noise filtering procedure and a non-linear PCA Neural Net...

Taxonomically Clustering Organisms Based on the Profiles of Gene Sequences Using PCA

The biological implications of bioinformatics can already be seen in various implementations. Biological taxonomy may seem like a simple science in which biologists merely observe similarities among organisms and construct classifications according to those similarities, but it is not so simple. By applying data mining techniques to a gene sequence database, we can cluster the data to find int...

Publication date: 2012